Overview:
- Apache Avro is a framework for data serialization and remote procedure calls (RPC).
- Apache Avro stores the schema of the data along with the serialized data. Because the schema is written once per file rather than with every record, the records stay compact, which helps the performance of the serialization-deserialization process.
- This article explains how to de-serialize an Apache Avro data file back into Python objects.
De-serializing data into Python Objects:
- Using DataFileReader, create a reader object by passing the file object for the data file and a DatumReader object as parameters.
- DataFileReader reads the data file, while DatumReader de-serializes the data present in the file.
- Remember, the data file contains both the data and the schema of the data.
- The data can contain both primitive types and complex types.
Example:
    # Import the Avro classes
    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    # Create the file object for the serialized data file
    fileObject = open("conference.avro", "rb")

    # Read the file using DataFileReader and
    # deserialize using DatumReader
    dataReader = DataFileReader(fileObject, DatumReader())

    # Print the conference details
    for conferenceDetail in dataReader:
        print(conferenceDetail)

    # Close the reader, which also closes the underlying file
    dataReader.close()
Output:
    {'name': 'Virutal conference', 'date': 25612345, 'location': 'New York', 'speakers': ['Speaker1', 'Speaker2'], 'participants': ['Participant1', 'Participant2', 'Participant3', 'Participant4', 'Participant5'], 'seatingArrangement': {'Participant1': 1, 'Participant2': 2, 'Participant3': 3, 'Participant4': 4, 'Participant5': 5}}
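As the output shows, each de-serialized record is an ordinary Python dict: Avro arrays arrive as lists and Avro maps as dicts, so no Avro-specific API is needed to work with them. The sketch below uses the record from the output above as a literal. The helper names `seat_of` and `summarize` are hypothetical and exist only for this illustration.

```python
# The record below is copied verbatim from the output above.
record = {
    "name": "Virutal conference",
    "date": 25612345,
    "location": "New York",
    "speakers": ["Speaker1", "Speaker2"],
    "participants": [
        "Participant1", "Participant2", "Participant3",
        "Participant4", "Participant5",
    ],
    "seatingArrangement": {
        "Participant1": 1, "Participant2": 2, "Participant3": 3,
        "Participant4": 4, "Participant5": 5,
    },
}


def seat_of(record, participant):
    """Look up a seat number in the 'seatingArrangement' map (a dict)."""
    return record["seatingArrangement"].get(participant)


def summarize(record):
    """Build a one-line summary from primitive and complex fields."""
    return "{} ({}): {} speakers, {} participants".format(
        record["name"],
        record["location"],
        len(record["speakers"]),      # Avro array -> Python list
        len(record["participants"]),
    )


print(summarize(record))
print(seat_of(record, "Participant3"))
```

`seat_of` returns `None` for a participant who is not in the map, which avoids a `KeyError` on unknown names.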